---
id: qa-agent-defining-an-evaluation-criteria
title: Defining Evaluation Criteria
sidebar_label: Defining Evaluation Criteria
---

In the previous section, we generated an entire synthetic QA dataset in just a few seconds. In this section, we'll use some of those queries to observe how our RAG QA agent responds and work out which factors we care about in its responses. This will help us narrow down our evaluation criteria, which we'll need in order to select the right metrics for evaluation.

## Revisiting the Generated User Queries

Let's examine the first three golden generations and generate responses for them. Since we're only examining three user queries, you can simply copy and paste these `input`s from Confident AI into your codebase.

```python
input_1 = "Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
input_2 = "If CloudMate introduces an Ultra plan, doubling current storage and security, what pricing might be expected?"
input_3 = "How does MadeUpCompany's adherence to GDPR, HIPAA, and SOC 2 ensure data protection?"
```

:::note
As you can see, these synthetic questions are great because they challenge your QA agent on edge cases and go beyond asking simple questions. For example, `input_2` asks about the pricing of a potential new plan offering.
:::

## Generating QA Agent Responses for Inspection

We'll pass these `input`s into our QA agent and see how it responds:

```python
input_1 = "Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services."
print(qa_agent.generate(input_1))
# CloudMate offers a secure and scalable cloud storage solution designed for businesses
# of all sizes. Its enterprise features include comprehensive compliance tools such as
# military-grade encryption, role-based access control, and multi-factor authentication.
# Additionally, CloudMate provides dedicated support services, including priority customer
# assistance and a dedicated account manager for enterprise clients.
```

The QA agent answered our first input correctly, providing relevant details about CloudMate's compliance tools and dedicated support services.

```python
input_2 = "If CloudMate introduces an Ultra plan, doubling current storage and security, what pricing might be expected?"
print(qa_agent.generate(input_2))
# The CloudMate Ultra plan will provide unlimited storage and top-tier security features
# at a fixed cost of $99.99/month.
```

The QA agent gave a speculative response, stating specific features and a fixed price for a hypothetical Ultra plan. The answer sounds plausible, but it has no factual basis unless CloudMate has officially announced such a plan.

```python
input_3 = "How does MadeUpCompany's adherence to GDPR, HIPAA, and SOC 2 ensure data protection?"
print(qa_agent.generate(input_3))
# MadeUpCompany ensures data protection through strict adherence to GDPR, HIPAA, and SOC 2 standards.
# This includes robust encryption protocols, access control mechanisms, and regular security audits
# to safeguard sensitive data.
```

While the response is generally relevant to data protection, it incorrectly assumes knowledge about MadeUpCompany when the QA agent is meant to provide answers specifically about CloudMate. Instead, the system should have either declined to answer or clarified its limitations.
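
Once you inspect more than a handful of queries, calling the agent one input at a time gets tedious. The sketch below shows one way to collect input/response pairs in a loop so they can be reviewed side by side. Note that `collect_responses` and `fake_generate` are hypothetical names introduced here for illustration; `fake_generate` is only a stand-in so the snippet runs on its own, and in practice you would pass `qa_agent.generate` instead.

```python
from typing import Callable, Dict, List


def collect_responses(
    generate: Callable[[str], str], inputs: List[str]
) -> List[Dict[str, str]]:
    """Run each input through the agent and pair it with the response."""
    return [{"input": text, "actual_output": generate(text)} for text in inputs]


# Hypothetical stand-in for `qa_agent.generate` (an assumption, not the real agent).
def fake_generate(query: str) -> str:
    return f"(stub answer to: {query})"


inputs = [
    "Examine CloudMate's unique enterprise offerings: comprehensive compliance tools and dedicated support services.",
    "If CloudMate introduces an Ultra plan, doubling current storage and security, what pricing might be expected?",
    "How does MadeUpCompany's adherence to GDPR, HIPAA, and SOC 2 ensure data protection?",
]

# Print each pair so the responses can be read one after another.
for row in collect_responses(fake_generate, inputs):
    print("INPUT: ", row["input"])
    print("OUTPUT:", row["actual_output"])
    print("-" * 40)
```

Keeping the pairs in a list of dictionaries also makes it easy to save them alongside your review notes.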

## Defining the Evaluation Criteria

By reviewing these three responses, we can start to understand what we care about in our QA agent's replies. While the tone could be more personable, the main issues are that some answers are **speculative** when a question falls outside the agent's knowledge base, and some are **irrelevant**. The qualities we care about form our evaluation criteria.

:::info
As you inspect more responses from your QA agent, your evaluation criteria will expand, especially as you incorporate product and user feedback.
:::

For now, we'll focus on two key criteria:

1. The answers must be _non-speculative_.
2. The answers must be _relevant_.
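
It can help to write the criteria down as data next to your manual observations, so the list is easy to extend as you review more responses. The following is a minimal sketch in plain Python: the `Criterion` class, the `review_notes` mapping, and the `failure_counts` helper are hypothetical names introduced here, the criterion descriptions are paraphrases, and the notes simply encode our reading of the three responses above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    name: str
    description: str


# The two criteria from this section, with paraphrased plain-language descriptions.
CRITERIA = [
    Criterion("non-speculative", "The answer does not state unannounced details as fact."),
    Criterion("relevant", "The answer addresses the question within the agent's scope."),
]

# Manual observations: which criteria each response failed (assumed from our review).
review_notes: Dict[str, List[str]] = {
    "input_1": [],                   # answered correctly
    "input_2": ["non-speculative"],  # invented details about an Ultra plan
    "input_3": ["relevant"],         # answered about MadeUpCompany, not CloudMate
}


def failure_counts(
    notes: Dict[str, List[str]], criteria: List[Criterion]
) -> Dict[str, int]:
    """Count how many reviewed responses failed each criterion."""
    counts = {c.name: 0 for c in criteria}
    for failed in notes.values():
        for name in failed:
            counts[name] += 1
    return counts


print(failure_counts(review_notes, CRITERIA))
# {'non-speculative': 1, 'relevant': 1}
```

A tally like this is only a bookkeeping aid for manual review; it does not replace the evaluation metrics themselves.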

:::note
In the next section, we’ll explore how these evaluation criteria help in selecting the right metrics for assessing our QA agent.
:::
